Automatic Alignment of English-Chinese Bilingual Texts of CNS News

Authors

  • Donghua Xu
  • Chew Lim Tan
Abstract

In this paper we present a method for aligning English-Chinese bilingual news reports from China News Service that combines lexical and statistical approaches. Because of the differences in sentence structure between English and Chinese, matching at the sentence level, as in many other works, may frequently group several sentences into a single match. In view of this, the current work also attempts to create shorter alignment pairs by permitting finer matching between clauses of the two texts where possible. The method is based on the statistical correlation between sentence or clause lengths in the two texts, and at the same time uses obvious anchors, such as numbers and place names that appear frequently in news reports, as lexical cues.
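The combination described in the abstract, a length-based score plus lexical anchors, can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the model, so the length ratio, the anchor bonus, the skip penalty, and the function names `numbers`, `pair_cost`, and `align` are all assumptions made for illustration. The sketch uses only numbers as anchors (not place names) and allows only one-to-one matches and unmatched segments.

```python
import math
import re

# All constants are assumptions made for this sketch, not values from the paper.
LENGTH_RATIO = 3.0   # assumed English characters per Chinese character
ANCHOR_BONUS = 1.0   # assumed cost reduction per number shared by a pair
SKIP_PENALTY = 4.0   # assumed cost of leaving a segment unmatched


def numbers(text):
    """Return the set of digit strings in a segment (a simple lexical anchor)."""
    return set(re.findall(r"\d+", text))


def pair_cost(en, zh):
    """Cost of pairing one English segment with one Chinese segment."""
    # Length term: distance of the observed length ratio from the assumed one.
    length_cost = abs(math.log((len(en) + 1) / (LENGTH_RATIO * len(zh) + 1)))
    # Anchor term: every number found in both segments lowers the cost.
    return length_cost - ANCHOR_BONUS * len(numbers(en) & numbers(zh))


def align(en_segs, zh_segs):
    """Align two segment lists by dynamic programming.

    Returns a list of (english_index, chinese_index) pairs for matched segments.
    """
    n, m = len(en_segs), len(zh_segs)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            steps = []
            if i > 0 and j > 0:  # match English i-1 with Chinese j-1
                c = cost[i - 1][j - 1] + pair_cost(en_segs[i - 1], zh_segs[j - 1])
                steps.append((c, (i - 1, j - 1)))
            if i > 0:            # leave English i-1 unmatched
                steps.append((cost[i - 1][j] + SKIP_PENALTY, (i - 1, j)))
            if j > 0:            # leave Chinese j-1 unmatched
                steps.append((cost[i][j - 1] + SKIP_PENALTY, (i, j - 1)))
            if steps:
                cost[i][j], back[i][j] = min(steps)
    # Walk the back-pointers from the end, keeping only the diagonal (match) steps.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if (pi, pj) == (i - 1, j - 1):
            pairs.append((pi, pj))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

Calling `align(english_clauses, chinese_clauses)` on two lists of strings returns index pairs; splitting the texts into clauses rather than sentences before the call is one way to obtain the shorter alignment pairs the abstract aims for.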

Similar resources

Aligning and Matching of English-Chinese Bilingual Texts of CNS News

This paper presents a project to align and match English-Chinese bilingual news files downloaded from China News Service’s website. The work involves the alignment of bilingual texts at the sentence and clause levels. In addition, the work also requires matching of files, as the English and Chinese news files downloaded from the web do not come in the same sequential order. These news files have...

Aligning Parallel English-chinese Texts Statistically with Lexical Criteria

We describe our experience with automatic alignment of sentences in parallel English-Chinese texts. Our report concerns three related topics: (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus; (2) experiments addressing the applicability of Gale & Church's (1991) length-based statistical method to the task of alignment involving a non-Indo-European language; and (3) an improve...

Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction

This paper first describes an experiment to construct an English-Chinese parallel corpus, then applies the Uplug word alignment tool to the corpus, and finally produces and evaluates an English-Chinese word list. The Stockholm English-Chinese Parallel Corpus (SEC) was created by downloading English-Chinese parallel corpora from a Chinese web site containing law texts that have been manually trans...

Aligning a Parallel English-Chinese Corpus Statistically with Lexical Criteria

We describe our experience with automatic alignment of sentences in parallel English-Chinese texts. Our report concerns three related topics: (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus; (2) experiments addressing the applicability of Gale & Church's (1991) length-based statistical method to the task of alignment involving a non-Indo-European language; and (3) an improve...

Full-text story alignment models for Chinese-English bilingual news corpora

In this paper, we describe the full-text story alignment on Chinese-English bilingual corpora of news data to mine potential parallel data for machine translation. Several standard information retrieval methods are tested and two translation-model based alignment models are proposed and studied. Modeling the process of generating the parallel English story from Chinese story gives significant i...

Journal title:
  • CoRR

Volume cmp-lg/9608017

Publication date 1996